Active Messages Implementations for the Meiko CS-2

Authors

  • Klaus E. Schauser
  • Chris J. Scheiman
Abstract

Active messages provide a low-latency communication architecture which, on modern parallel machines, achieves more than an order of magnitude performance improvement over more traditional communication libraries. They are used by library and compiler writers to obtain the utmost performance and have been used to implement the novel parallel language Split-C. This paper discusses the experience we gained while implementing active messages on the Meiko CS-2, and discusses implementations for similar architectures. The CS-2 is an interesting experimental platform, as it resembles a cluster of Sparc workstations, each equipped with a dedicated communication co-processor. During our work we identified two mismatches between the requirements of active messages and the Meiko CS-2 architecture. First, architectures which only support efficient remote write operations (or DMA transfers, as in the case of the CS-2) make it difficult to transfer both data and control, as required by active messages. Traditional network interfaces avoid this problem because they have a single point of entry which essentially acts as a queue. To efficiently support active messages on modern network communication co-processors, hardware primitives are required which support this queue behavior. We overcame this problem by producing specialized code which runs on the communications co-processor and supports the active messages protocol. We also identify hardware primitives which are required to efficiently support active messages. The second mismatch is that active messages do not provide a non-blocking form of send, which is required to achieve the highest possible bandwidth while allowing the overlap of communication and computation when a communications co-processor is present. We propose to extend the current active message definition to include a non-blocking form of send. Our implementation of active messages results in a one-way latency of 12.3 µs and achieves up to 39 MB/s for bulk transfers. Both numbers are close to optimal for the current Meiko hardware and are competitive with the performance of active messages on other hardware platforms.
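
The first mismatch described in the abstract concerns delivering data and control together through a single queue-like entry point. A toy, single-process sketch may make that idea concrete. Everything in it (the `Node` class, `am_send`, `poll`, the handler table) is a hypothetical illustration chosen for this example; it is not the paper's implementation and does not model the Meiko CS-2 hardware or its API.

```python
from collections import deque


class Node:
    """Toy endpoint (hypothetical): a handler table plus one FIFO entry queue."""

    def __init__(self):
        self.handlers = []    # handler index -> function
        self.inbox = deque()  # single point of entry, behaves as a queue
        self.memory = {}      # stand-in for the node's local memory

    def register(self, fn):
        """Add a handler and return the index that messages will carry."""
        self.handlers.append(fn)
        return len(self.handlers) - 1

    def poll(self):
        """Drain the queue, running each message's handler on its arguments."""
        handled = 0
        while self.inbox:
            handler_idx, args = self.inbox.popleft()
            self.handlers[handler_idx](self, *args)
            handled += 1
        return handled


def am_send(dest, handler_idx, *args):
    """Deposit control (handler index) and data (args) as one queue entry."""
    dest.inbox.append((handler_idx, args))


def store_handler(node, key, value):
    """Example handler: integrate the message data into local memory."""
    node.memory[key] = value


receiver = Node()
STORE = receiver.register(store_handler)
am_send(receiver, STORE, "x", 42)
am_send(receiver, STORE, "y", 7)
handled = receiver.poll()
print(handled, receiver.memory)
```

Because each queue entry bundles the handler index with its arguments, the receiver never sees data without the matching control information. That coupling is what a plain remote write does not provide on its own.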

Similar resources

Experience with active messages on the Meiko CS-2

Active messages provide a low latency communication architecture which on modern parallel machines achieves more than an order of magnitude performance improvement over more traditional communication libraries. This paper discusses the experience we gained while implementing active messages on the Meiko CS-2, and discusses implementations for similar architectures. During our work we have ident...

LogP Performance Assessment of Fast Network Interfaces

We present a systematic performance assessment of the hardware and software that provides the interface between applications and emerging high-speed networks. Using LogP as a conceptual framework and Active Messages as the communication layer, we devise a set of communication microbenchmarks. These generate a graphical signature from which we extract the LogP performance parameters of latency, ...

Exploiting the Capabilities of Communications Co-Processors

Communications co-processors (CCPs) have become commonplace in modern MPPs and networks of workstations. These co-processors provide dedicated hardware support for fast communication. In this paper we study how to exploit the capabilities of CCPs for executing user level message handlers. We show, in the context of Active Messages and Split-C, that we can move message handling code to the co-pr...

Performance Evaluation and Modeling of MPI Communications on the Meiko CS-2

This paper presents, evaluates and compares the performance of the point-to-point and broadcast communication primitives of the MPI-1 standard library on the Meiko CS-2 parallel machine. Furthermore, a benchmark model of MPI communications is proposed. It is based on the size of messages exchanged and the number of processors involved.

One Step Closer towards a Realistic Model for Parallel Computation

We present a new model of parallel computation, the LogGP model, and use it to analyze a number of algorithms, most notably, the single node scatter (one-to-all personalized broadcast). The LogGP model is an extension of the LogP model for parallel computation [CKP+93] which abstracts the communication of fixed-sized short messages through the use of four parameters: the communication latency (L),...

Publication date: 1994